Safer implementation of RepeatN #130887

Soveu · 2024-09-26T16:31:38Z

I've seen the "Use MaybeUninit for RepeatN" commit while reading This Week In Rust and immediately thought about something I've written some time ago - https://github.com/Soveu/repeat_finite/blob/master/src/lib.rs.

Since Option can use the niche in (T, NonZeroUsize), we can construct something that has the same size as (T, usize) while getting rid of MaybeUninit completely.
This leaves only unsafe on TrustedLen, which is pretty neat.
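
A minimal sketch of the size claim, assuming a hypothetical two-field struct named Inner (the struct and both helper functions are illustrative names, not taken from the PR). Placing None in the niche of a NonZero field is what current rustc does; it is not a documented layout guarantee for arbitrary structs, so treat the equality as observed behaviour:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Hypothetical stand-in for the iterator state: an element plus a count that is never zero.
#[allow(dead_code)]
struct Inner<T> {
    element: T,
    count: NonZeroUsize,
}

/// Size of the niche-based state: `None` can reuse the forbidden zero value of `count`.
fn niche_state_size<T>() -> usize {
    size_of::<Option<Inner<T>>>()
}

/// Size of the plain (element, count) pair used for comparison.
fn plain_pair_size<T>() -> usize {
    size_of::<(T, usize)>()
}
```

Calling both helpers with the same type parameter, for example u16 or String, should give the same number on current compilers.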

rustbot · 2024-09-26T16:31:46Z

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Soveu · 2024-09-26T17:34:49Z

huh, I'm no llvm expert, buuut...

; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind willreturn memory(argmem: readwrite) uwtable
define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr #0 {
start:
  %0 = load i64, ptr %it, align 8, !alias.scope !3, !noundef !6
  %1 = icmp eq i64 %0, 0
  br i1 %1, label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit", label %bb8.i

bb8.i: ; preds = %start
  %_10.0.i = add i64 %0, -1
  %2 = icmp eq i64 %_10.0.i, 0
  br i1 %2, label %bb5.i, label %bb3.i

bb3.i: ; preds = %bb8.i
  store i64 %_10.0.i, ptr %it, align 8, !alias.scope !3
  ; ^~~~~~~~~~~~~~~~ next:29  error: match on wrong line
  %_14.i = getelementptr inbounds i8, ptr %it, i64 8
  %_14.val.i = load i16, ptr %_14.i, align 8, !alias.scope !3, !noundef !6
  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit"

bb5.i: ; preds = %bb8.i
  %3 = getelementptr inbounds i8, ptr %it, i64 8
  %4 = load i16, ptr %3, align 8, !alias.scope !7
  store i64 0, ptr %it, align 8, !alias.scope !7
  br label %"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit"

"_ZN106_$LT$core..iter..sources..repeat_n..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hd614a6386f3f6a14E.exit": ; preds = %start, %bb3.i, %bb5.i
  %_0.sroa.5.0.i = phi i16 [ %4, %bb5.i ], [ %_14.val.i, %bb3.i ], [ undef, %start ]
  %_0.sroa.0.0.i = phi i16 [ 1, %bb5.i ], [ 1, %bb3.i ], [ 0, %start ]
  %5 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0
  %6 = insertvalue { i16, i16 } %5, i16 %_0.sroa.5.0.i, 1
  ret { i16, i16 } %6
}

bb3.i and bb5.i would look identical if not for the store i64.

The other functions (vec_extend_via_repeat_n and array_repeat_not_copy) look ok.

slanterns · 2024-10-15T12:59:47Z

Maybe you can ping scottmcm (test author) to see if the change in codegen result is acceptable (then adjust the test.)

Dylan-DPC · 2024-11-28T15:16:35Z

@Soveu any updates on the failed test? thanks

Soveu · 2024-11-29T17:37:05Z

> @Soveu any updates on the failed test? thanks

I guess we don't want to introduce additional branches in the generated code, so for "production" code this implementation would be unfit. @scottmcm could comment on that.

scottmcm · 2024-11-29T21:08:08Z

It is pretty neat, and in fact it's the implementation I originally wrote for this.

The biggest issue I had with it then, though, is that it makes the iterator be potentially-undef, but because of the #130141 bug (which reminds me to go push on rust-lang/rfcs#3712 ) we're stuck with that problem anyway, so there's probably a way you could make this work.

Looks like the problem is that it's having trouble figuring out that the store of zero and the store of count-1 can be merged, because otherwise there'd be no difference.

Basically, what this test is testing is that for !needs_drop types we don't need to consider the last iteration separately, because the difference between them is whether we need to call drop, but since we don't need to do that for !needs_drop it's not an issue, and thus we should be able to avoid that second branch. (We of course need the "is the iterator empty?" branch, but once we know it's not empty there should be no further branches needed for !needs_drop types.)
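
For reference, a small sketch of what !needs_drop means in practice, using std::mem::needs_drop. The helper name is made up for illustration. The documentation allows needs_drop to conservatively return true, so the results for the plain types reflect current behaviour rather than a hard guarantee:

```rust
use std::mem::needs_drop;

/// Hypothetical helper: true when the last iteration needs no special handling,
/// because there is no destructor to run on the stored element.
fn last_iteration_is_ordinary<T>() -> bool {
    !needs_drop::<T>()
}
```

With this helper, u16 and [u8; 4] count as ordinary, while String and Vec<u8> do not, since both own a heap allocation that has to be freed.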

Nothing immediately jumps to mind for how to get LLVM to know that, though. Maybe try a separate if const { mem::needs_drop::<T>() } path that could avoid the take, and see if that helps? Or maybe there's a way to phrase this such that you compute the value to write in a local, then there's just one write of it back to self so LLVM just needs to merge the values instead of merging the stores?

I'd suggest copying the implementation to godbolt and playing around a bit there until you find an incantation that works.

Soveu · 2024-11-29T21:47:15Z

I tried some things when first writing this code, with no effect, but the idea with mem::needs_drop made me try again. I was able to make the branches very similar, so LLVM merges them :) Technically there is still a case where clone is more expensive than a move but the type does not need drop. It probably doesn't happen often, but it might need consideration.

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return if core::mem::needs_drop::<A>() {
            self.take_element()
        } else {
            let tmp = inner.element.clone();
            self.inner = None;
            Some(tmp)
        };
    }

Godbolt: https://rust.godbolt.org/z/Kr198x635
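
The clone-versus-move caveat can be made visible with a type that has no destructor but an observable Clone. This is only a sketch: Token, CLONES and take_last are hypothetical names, and take_last mirrors just the final return of the next above, applied to a bare Option slot instead of the real iterator state:

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often Token::clone runs.
static CLONES: AtomicUsize = AtomicUsize::new(0);

// No Drop impl and no field that needs dropping, so needs_drop::<Token>() is false.
struct Token(u16);

impl Clone for Token {
    fn clone(&self) -> Self {
        CLONES.fetch_add(1, Ordering::Relaxed);
        Token(self.0)
    }
}

/// Hands out the last element: moved for types that need drop, cloned otherwise.
fn take_last<A: Clone>(slot: &mut Option<A>) -> Option<A> {
    if mem::needs_drop::<A>() {
        slot.take()
    } else {
        let tmp = slot.as_ref()?.clone();
        *slot = None;
        Some(tmp)
    }
}
```

Calling take_last on Some(Token(7)) bumps CLONES by one even though a plain move would have been enough, which is the extra cost mentioned above.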

LLVM output:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr personality ptr @rust_eh_personality {
start:
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %1 = load i64, ptr %0, align 8
  %2 = icmp eq i64 %1, 0
  br i1 %2, label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit", label %bb13.i

bb13.i:
  %count.i = add i64 %1, -1
  %3 = load i16, ptr %it, align 8
  store i64 %count.i, ptr %0, align 8
  br label %"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit"

"_ZN84_$LT$example..RepeatN$LT$A$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$4next17hbed8c0f5d41a7103E.exit":
  %_0.sroa.6.0.i = phi i16 [ undef, %start ], [ %3, %bb13.i ]
  %_0.sroa.0.0.i = phi i16 [ 0, %start ], [ 1, %bb13.i ]
  %4 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0.i, 0
  %5 = insertvalue { i16, i16 } %4, i16 %_0.sroa.6.0.i, 1
  ret { i16, i16 } %5
}

declare noundef range(i32 0, 10) i32 @rust_eh_personality(i32 noundef, i32 noundef range(i32 1, 17), i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

LLVM output for the current std implementation:

define { i16, i16 } @iter_repeat_n_next(ptr noalias nocapture noundef align 8 dereferenceable(16) %it) unnamed_addr {
start:
  %_3 = load i64, ptr %it, align 8
  %_2.not = icmp eq i64 %_3, 0
  br i1 %_2.not, label %bb4, label %bb1

bb1:
  %_2.i = add i64 %_3, -1
  store i64 %_2.i, ptr %it, align 8
  %0 = getelementptr inbounds i8, ptr %it, i64 8
  %result.i = load i16, ptr %0, align 8
  br label %bb4

bb4:
  %_0.sroa.3.0 = phi i16 [ %result.i, %bb1 ], [ undef, %start ]
  %_0.sroa.0.0 = phi i16 [ 1, %bb1 ], [ 0, %start ]
  %1 = insertvalue { i16, i16 } poison, i16 %_0.sroa.0.0, 0
  %2 = insertvalue { i16, i16 } %1, i16 %_0.sroa.3.0, 1
  ret { i16, i16 } %2
}

From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

Soveu · 2024-11-29T21:59:44Z

That is interesting, just switching variables made it work 🤔

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            // This must be in this order!
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        return self.take_element();
    }
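
To try this outside of core, here is a self-contained sketch built around the next above. The struct definitions, the repeat_n constructor and the body of take_element are assumptions about the surrounding code, not copied from the PR:

```rust
use std::num::NonZero;

// Assumed shape of the inner state; the field names match the snippet above.
struct RepeatNInner<A> {
    count: NonZero<usize>,
    element: A,
}

pub struct RepeatN<A> {
    inner: Option<RepeatNInner<A>>,
}

/// Assumed constructor: a count of zero drops the element and yields an empty iterator.
pub fn repeat_n<A: Clone>(element: A, count: usize) -> RepeatN<A> {
    let inner = NonZero::new(count).map(|count| RepeatNInner { count, element });
    RepeatN { inner }
}

impl<A> RepeatN<A> {
    /// Assumed helper: moves the element out and leaves the iterator empty.
    fn take_element(&mut self) -> Option<A> {
        self.inner.take().map(|inner| inner.element)
    }
}

impl<A: Clone> Iterator for RepeatN<A> {
    type Item = A;

    fn next(&mut self) -> Option<A> {
        let inner = self.inner.as_mut()?;
        let count = inner.count.get();

        if let Some(new_count) = NonZero::<usize>::new(count - 1) {
            // Clone first, then store the decremented count.
            let tmp = inner.element.clone();
            inner.count = new_count;
            return Some(tmp);
        }

        // Last item: move the element out instead of cloning it.
        self.take_element()
    }
}
```

Collecting repeat_n(String::from("ab"), 3) into a Vec gives three copies of the string, with the last one moved rather than cloned.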

scottmcm · 2024-11-29T22:01:53Z

> just switching variables made it work 🤔

Definitely odd, but that's why we have codegen tests :P

scottmcm · 2024-11-29T22:35:19Z

> From what I see, %0 = getelementptr inbounds i8, ptr %it, i64 8 gets moved to the top

What that means is that it ended up getting a different layout: originally it was storing the count at the start of the struct (offset 0), but apparently it flipped the order, so the count is now read from offset 8.

I don't have a strong feeling about which order is better, but if you want you could put #[repr(C)] on RepeatNInner, with a comment like "// not stable, just keeps the layout consistent for codegen tests", to make it give you the order you want.
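
A sketch of that suggestion, checking the resulting offsets with offset_of!. The field order (count first) and the helper function are assumptions made for illustration; the PR may have picked the other order:

```rust
use std::mem::offset_of;
use std::num::NonZero;

// not stable, just keeps the layout consistent for codegen tests
#[repr(C)]
#[allow(dead_code)]
struct RepeatNInner<A> {
    count: NonZero<usize>,
    element: A,
}

/// Byte offsets of (count, element) when the element type is u16.
fn u16_field_offsets() -> (usize, usize) {
    (
        offset_of!(RepeatNInner<u16>, count),
        offset_of!(RepeatNInner<u16>, element),
    )
}
```

With repr(C) the fields are laid out in declaration order, so count sits at offset 0 and element right after it, at size_of::<usize>() bytes.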

Soveu · 2024-11-30T23:12:34Z

@scottmcm @Amanieu it's ready

Soveu · 2024-11-30T23:15:33Z

@rustbot label +S-waiting-on-review -S-waiting-on-author

…epeatn, r=<try> Use `iter::repeat_n` to implement `Vec::extend_with` This replaces the `Vec::extend_with` manual implementation, which is used by `Vec::resize` and `Vec` `SpecFromElem`, with `iter::repeat_n`. I've compared the codegen output between: 1. the current `Vec::resize` impl 2. this branch 3. this branch + rust-lang#130887 3 gives the closest codegen output to 1, with some output improvements. 2 doesn't look good: https://rust.godbolt.org/z/Yrc83EhjY. May also help rust-lang#120050? --- WARNING: DO NOT MERGE - in order to run the perf run in rust-lang#133662 (comment) this PR currently also contains commits from rust-lang#130887

paolobarbolini · 2024-12-01T11:38:50Z

FYI we're also playing around with this change at #133662 since Compiler Explorer showed that the two MRs combined made Vec::resize slightly better. We'll see how the perf run goes.
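
As a rough illustration of the idea in #133662 (not the actual Vec::extend_with code), growing a vector through iter::repeat_n gives the same result as Vec::resize. The helper name is hypothetical:

```rust
use std::iter;

/// Appends `extra` copies of `value` to `v` using iter::repeat_n.
fn grow_with_repeat_n<T: Clone>(v: &mut Vec<T>, value: T, extra: usize) {
    v.extend(iter::repeat_n(value, extra));
}
```

For a vector of length 2, calling grow_with_repeat_n(&mut v, 9, 3) should leave it equal to what v.resize(5, 9) would produce.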

Folyd · 2024-12-01T17:12:31Z

Nice, this reminds me of my article on Rust Magazine: VecDeque::resize() optimization.

fmease · 2025-06-09T19:49:25Z

@Soveu It looks like we now have a genuine failure. Could you take a look, fix it if it really is genuine, and rebase? Then we can try again. Thanks!

Soveu · 2025-06-09T21:29:29Z

uhh, another codegen failure...

Soveu · 2025-06-09T21:33:24Z

looks like load and store changed order, I'll fix it tomorrow

Soveu · 2025-06-13T20:34:42Z

@bors2 try jobs=test-various

rust-bors · 2025-06-13T20:34:44Z

@Soveu: 🔑 Insufficient privileges: not in try users

Soveu · 2025-06-13T20:41:14Z

@rustbot ready

fmease · 2025-06-13T20:44:33Z

@bors2 try jobs=test-various

rust-bors · 2025-06-13T20:44:37Z

⌛ Trying commit 8f77681 with merge 18cda02…

To cancel the try build, run the command @bors2 try cancel.

rust-bors · 2025-06-13T22:08:10Z

☀️ Try build successful (CI)
Build commit: 18cda02 (18cda0292a94079d157183002a4da07f2a20c322, parent: 0d6ab209c525e276cbe7544cbd39a3c3619b6b18)

Soveu · 2025-06-14T06:29:40Z

@scottmcm LGTM

scottmcm · 2025-06-18T02:49:20Z

Sure, let's try it again...
@bors r+

bors · 2025-06-18T02:49:23Z

📌 Commit 8f77681 has been approved by scottmcm

It is now in the queue for this repository.

bors · 2025-06-18T03:18:13Z

⌛ Testing commit 8f77681 with merge 27733d4...

bors · 2025-06-18T06:17:02Z

☀️ Test successful - checks-actions
Approved by: scottmcm
Pushing 27733d4 to master...

github-actions · 2025-06-18T06:19:46Z

What is this?

This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 77ec48f (parent) -> 27733d4 (this PR)

Test differences

Show 6 test diffs

Stage 1

[codegen] tests/codegen/iter-repeat-n-trivial-drop.rs: pass -> ignore (ignored when randomizing layouts) (J1)

Stage 2

[codegen] tests/codegen/iter-repeat-n-trivial-drop.rs: pass -> ignore (ignored when randomizing layouts) (J0)

Additionally, 4 doctest diffs were found. These are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard 27733d46d79f4eb92e240fbba502c43022665735 --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

dist-aarch64-apple: 4784.6s -> 6180.4s (29.2%)
dist-x86_64-apple: 9044.0s -> 6930.5s (-23.4%)
mingw-check-1: 1993.8s -> 1656.5s (-16.9%)
x86_64-rust-for-linux: 2916.1s -> 2503.9s (-14.1%)
x86_64-msvc-ext1: 7807.6s -> 6740.9s (-13.7%)
x86_64-gnu-llvm-20-1: 3704.4s -> 3249.4s (-12.3%)
x86_64-gnu-llvm-20-3: 6988.5s -> 6241.0s (-10.7%)
i686-gnu-1: 8073.9s -> 7233.0s (-10.4%)
i686-gnu-2: 6227.2s -> 5601.1s (-10.1%)
x86_64-gnu-stable: 7578.3s -> 6826.0s (-9.9%)

How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

rust-timer · 2025-06-18T16:13:09Z

Finished benchmarking commit (27733d4): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary -0.7%, secondary 1.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.9%, 0.9%]	1
Regressions ❌ (secondary)	1.5%	[1.5%, 1.5%]	1
Improvements ✅ (primary)	-2.3%	[-2.3%, -2.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-0.7%	[-2.3%, 0.9%]	2

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 692.662s -> 693.071s (0.06%)
Artifact size: 372.09 MiB -> 372.10 MiB (0.00%)

rustbot assigned Amanieu Sep 26, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Sep 26, 2024

Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 28, 2024

Soveu force-pushed the repeatn branch from 464f771 to ff05378 Compare November 29, 2024 22:08

scottmcm reviewed Nov 29, 2024

paolobarbolini mentioned this pull request Nov 30, 2024

Use iter::repeat_n to implement Vec::extend_with #133662

Open

Soveu force-pushed the repeatn branch from ff05378 to 496db67 Compare November 30, 2024 21:36

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 30, 2024

paolobarbolini mentioned this pull request Dec 2, 2024

Vec::resize for bytes should be a single memset #120050

Draft

fmease added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 9, 2025

Soveu force-pushed the repeatn branch from c8d9389 to b69c7c9 Compare June 13, 2025 20:27

100% safe implementation of RepeatN

8f77681

Soveu force-pushed the repeatn branch from b69c7c9 to 8f77681 Compare June 13, 2025 20:33

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 13, 2025

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 18, 2025

bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 18, 2025

bors merged commit 27733d4 into rust-lang:master Jun 18, 2025
11 checks passed

rustbot added this to the 1.89.0 milestone Jun 18, 2025
